Communication code generation for automatic parallelization of irregular loops
FU LiGuo YAO Yuan DING Rui
Journal of Computer Applications    2014, 34 (4): 1014-1018.   DOI: 10.11772/j.issn.1001-9081.2014.04.1014

Irregular computation is widespread in large-scale parallel applications, yet automatic parallelization for distributed memory can hardly generate parallel code for irregular loops at compile time. The communication code in the generated parallel program affects both the correctness and the efficiency of its execution. Useful communication code can be generated automatically at compile time for a common class of irregular loops by using partial communication redundancy, which requires analyzing the array redistribution graph of the program to maintain the producer-consumer relation of irregular array references. The approach uses the computation decomposition and the access expressions of the array references to find the local definition set of the irregular array on each processor and takes that set as the communication data set. It then analyzes the communication strategies for such irregular loops and generates the corresponding communication code. The experimental results show that the approach is valid and that the test applications achieve the expected speedup.
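
The idea of a per-processor local definition set can be illustrated with a toy simulation. This is a minimal sketch, not the paper's algorithm: it assumes a block decomposition of loop iterations and a loop of the form `x[idx[i]] = ...`, and the helper names (`block_range`, `local_definition_set`, `communication_sets`) are hypothetical. A real compiler would derive these sets symbolically at compile time, whereas this sketch simply enumerates them from a concrete index array.

```python
def block_range(p, nprocs, n):
    """Half-open iteration range [lo, hi) assigned to processor p
    under an assumed block decomposition of n iterations."""
    chunk = (n + nprocs - 1) // nprocs  # ceiling division
    lo = min(p * chunk, n)
    hi = min(lo + chunk, n)
    return lo, hi


def local_definition_set(idx, p, nprocs):
    """Elements of the irregular array x written by processor p
    for a loop body of the form x[idx[i]] = ... (assumed shape)."""
    lo, hi = block_range(p, nprocs, len(idx))
    return sorted({idx[i] for i in range(lo, hi)})


def communication_sets(idx, nprocs):
    """Map each processor to the set of elements it defines.
    Here that whole set is treated as the data the processor sends,
    even if some consumers do not need every element."""
    return {p: local_definition_set(idx, p, nprocs) for p in range(nprocs)}


if __name__ == "__main__":
    idx = [3, 0, 3, 5, 1, 1, 2, 7]  # hypothetical indirection array
    for p, elems in communication_sets(idx, 2).items():
        print(f"processor {p} defines x{elems}")
```

Sending the whole local definition set may transfer more data than each consumer strictly needs, which is one possible reading of "partial communication redundancy"; the abstract does not spell out the exact strategy.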

Parallel cost model for heterogeneous multi-core processors
HUANG Pinfeng ZHAO Rongcai YAO Yuan ZHAO Jie
Journal of Computer Applications    2013, 33 (06): 1544-1547.   DOI: 10.3724/SP.J.1087.2013.01544
Existing parallel cost models are mostly designed for shared-memory or distributed-memory architectures and are therefore not suitable for heterogeneous multi-core processors. To solve this problem, a new parallel cost model for heterogeneous multi-core processors was proposed. It quantitatively describes the impact of computing capacity, memory access delay, and data transfer cost on the parallel execution time of loops, thereby improving the accuracy with which loops that benefit from parallelization (accelerated parallel loops) are recognized. The experimental results show that the proposed model can effectively recognize accelerated parallel loops, and that using its recognition results to generate parallel code can significantly improve the performance of parallel programs on heterogeneous multi-core processors.
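
A cost model of this kind can be sketched as a simple profitability check. The code below is an illustrative assumption, not the model from the paper: the formulas, the `LoopProfile` and `Core` types, and every parameter name are hypothetical. It combines only the three factors the abstract names (computing capacity, memory access delay, data transfer cost) in the simplest additive way.

```python
import math
from dataclasses import dataclass


@dataclass
class LoopProfile:
    iterations: int
    ops_per_iter: float           # arithmetic operations per iteration
    mem_accesses_per_iter: float  # memory accesses per iteration
    bytes_transferred: float      # data moved to and from the accelerator cores


@dataclass
class Core:
    ops_per_second: float  # computing capacity
    mem_latency_s: float   # average delay per memory access, in seconds


def iteration_time(loop, core):
    """Estimated time of one iteration on the given core."""
    return (loop.ops_per_iter / core.ops_per_second
            + loop.mem_accesses_per_iter * core.mem_latency_s)


def serial_time(loop, host):
    """Estimated time when the whole loop runs on the host core."""
    return loop.iterations * iteration_time(loop, host)


def parallel_time(loop, accel, n_cores, bandwidth_bytes_per_s):
    """Estimated time when iterations are split evenly across n_cores
    accelerator cores, plus a one-off data transfer cost."""
    iters_per_core = math.ceil(loop.iterations / n_cores)
    transfer = loop.bytes_transferred / bandwidth_bytes_per_s
    return iters_per_core * iteration_time(loop, accel) + transfer


def is_accelerated(loop, host, accel, n_cores, bandwidth_bytes_per_s):
    """True if the estimate predicts the parallel version is faster."""
    return parallel_time(loop, accel, n_cores, bandwidth_bytes_per_s) < serial_time(loop, host)
```

With such a check, a loop whose transfer cost outweighs its compute savings would be left serial, which is the kind of decision the abstract attributes to the cost model.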
Superword level parallelism instruction analysis and redundancy optimization algorithm on DSP
SUO Wei-yi ZHAO Rong-cai YAO Yuan LIU Peng
Journal of Computer Applications    2012, 32 (12): 3303-3307.   DOI: 10.3724/SP.J.1087.2012.03303
SIMD (Single Instruction Multiple Data) technology is now widely used in Digital Signal Processors (DSPs), and most existing compilers provide automatic vectorization. However, these compilers cannot perform SIMD auto-vectorization that exploits DSP features, because of the complex DSP instruction set, the specific addressing modes, dependence relations that obstruct vectorization, non-aligned data, and other reasons. To solve this problem, the instruction analysis and redundancy optimization algorithms for Superword Level Parallelism (SLP) automatic vectorization in the Open64 compiler back end were improved, so that source programs are transformed into more efficient vectorized code. The experimental results show that the proposed method can effectively improve DSP performance and reduce power consumption.